Skip to content

feat(decide): pure DECIDE core + exhaustive truth-table tests#4

Merged
harshitsinghbhandari merged 6 commits into
feat/lcm-sm-contractsfrom
session/aa-2
May 26, 2026
Merged

feat(decide): pure DECIDE core + exhaustive truth-table tests#4
harshitsinghbhandari merged 6 commits into
feat/lcm-sm-contractsfrom
session/aa-2

Conversation

@harshitsinghbhandari

Copy link
Copy Markdown
Collaborator

Summary

Implements the pure DECIDE core for the LCM + Session Manager lane (PR #3), replacing the stubbed panic(...) bodies in backend/internal/domain/decide/decide.go with real, total, side-effect-free deciders, plus exhaustive table tests.

Stacked on #2 (feat/lcm-sm-contracts) — this PR is based on the contracts branch, not main, so the diff is just the two decide files. Merge #2 first.

Deciders

  • ResolveProbeDecision — kill-intent short-circuits straight to terminal killed; a failed probe (timeout/error) and probe disagreement (runtime/process mismatch, or runtime-dead with recent activity) route to detecting; only runtime-dead + process-dead + no-recent-activity concludes killed. (Honors "failed probe ≠ dead" and "one authority for death".)
  • ResolveOpenPRDecision — the PR pipeline ladder in exact order: ci_failing → changes_requested → approved+mergeable → approved → review_pending → idle-beyond → open.
  • ResolveTerminalPRStateDecision — merged → idle / merged_waiting_decision; closed → idle.
  • CreateDetectingDecision — anti-flap quarantine. Unchanged-evidence counter; StartedAt is preserved across the whole detecting episode so the duration cap is a real wall-clock safety net even when evidence flaps. Escalates to stuck at DetectingMaxAttempts (3) consecutive unchanged ticks or DetectingMaxDuration (5m).
  • HashEvidence — strips ISO datetimes, bare time-of-day, and epoch numbers, then collapses whitespace before SHA-256, so restamped-but-unchanged signals compare equal.

Notes for reviewers

  • No changes to the frozen contract types in domain/ or ports/; no decide input struct needed extending.
  • LifecycleDecision carries no runtime sub-state fields, so the probe decider expresses runtime death via SessionReason (runtime_lost / agent_process_exited) — the LCM is expected to translate that into the persisted runtime substate.
  • DetectingInput.ProposedState is kept for the contract but isn't load-bearing in the output (output state is always detecting or escalated stuck); the reason is driven by ProposedReason.

Test plan

  • gofmt -l . prints nothing
  • go build ./...
  • go vet ./...
  • go test -race ./...
  • 100% statement coverage on internal/domain/decide
  • Every branch of every decider table-tested, plus a consistency test that the open-PR ladder's display Status matches DeriveLegacyStatus over the canonical state it emits, and an end-to-end probe→detecting→stuck escalation flow

🤖 Generated with Claude Code

…tests

Replace the stubbed deciders in domain/decide with real, total,
side-effect-free implementations:

- ResolveProbeDecision: kill-intent short-circuits to terminal; failed
  probes and probe disagreement route to detecting; only runtime-dead +
  process-dead + no-recent-activity concludes killed.
- ResolveOpenPRDecision: the PR pipeline ladder
  (ci_failing > changes_requested > approved+mergeable > approved >
  review_pending > idle-beyond > open).
- ResolveTerminalPRStateDecision: merged -> idle/merged_waiting_decision,
  closed -> idle.
- CreateDetectingDecision: anti-flap quarantine — unchanged-evidence
  counter with StartedAt preserved across the episode so the duration cap
  is a real wall-clock safety net; escalates to stuck at 3 ticks or 5m.
- HashEvidence: strips timestamps/epochs and collapses whitespace before
  hashing so restamped-but-unchanged signals compare equal.

Table tests cover every branch (100% statement coverage), including a
consistency check that the open-PR ladder's display Status matches
DeriveLegacyStatus over the canonical state it emits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@harshitsinghbhandari harshitsinghbhandari left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — pure DECIDE core

Recommendation: Approve, with follow-ups. Verified independently in a throwaway worktree: gofmt clean, go build/vet pass, go test -race green at 100% statement coverage. The core logic is correct, pure, and exhaustively tested. Findings below are about the seam with the (not-yet-built) LCM, one semantic gap, and minor items — none are correctness bugs in the isolated functions.

The two areas called out in the request — both correct

  • Probe branch orderingkill → any-probe-failed → runtime switch. A failed probe never concludes death; disagreements and indeterminate-process quarantine; only runtime-dead + process-dead + no-recent-activity reaches killed.
  • Detecting counter/duration ✓ Preserving StartedAt across an evidence change while resetting the attempt counter is the right call — the 5-min cap is a whole-episode wall-clock safety net, not per-evidence, so a flapping-but-stuck session still escalates. The "duration cap fires even when evidence keeps flapping" test proves it.

Architecture & design (carry into the LCM PR)

  1. Probe decider conflates liveness with session activity state. A healthy runtime returns SessionWorking / ReasonTaskInProgress. When the reaper feeds this through ApplyRuntimeObservation, the LCM must not let a liveness verdict clobber a more specific activity-derived state (e.g. a session in waiting_input set by ApplyActivitySignal). Needs an explicit composition rule for which decider owns SessionState when the runtime is alive. Not a bug here — a contract to document in the LCM PR.
  2. mergeable is unreachable without Approved. The distilled ladder is "approved /none + mergeable → MERGEABLE", but the impl only reaches MERGEABLE on Approved && Mergeable. A mergeable PR on a repo with no required review falls through to PR_OPEN. Either OpenPRInput should express "no review required" or Mergeable should be weighed independently. Confirm intent before the LCM builds the SCMFacts → OpenPRInput mapping.

Integration contract (LCM PR)

  1. Detecting-memory lifecycle. The deciders correctly return Detecting == nil for every non-detecting outcome and non-nil only while quarantined. The LCM must therefore clear the persisted detecting memory (via LifecyclePatch.ClearDetecting) whenever a decision's Detecting is nil — otherwise a stale StartedAt/EvidenceHash leaks into a later episode.

Testing

  1. Extend the consistency cross-check. TestResolveOpenPRDecisionDerivesConsistently (decision.Status == DeriveLegacyStatus(canonical)) is excellent but only covers the open-PR ladder. The probe and terminal deciders are also consistent (hand-verified) — adding them to the same test locks the invariant against drift.

Code quality (minor)

  1. HashEvidence epoch regex \b\d{10,13}\b strips any 10–13 digit run, which would also clobber a legitimate large ID if one ever appears in an evidence string. Low risk today; worth a comment or tightening. (HH:MM without seconds also isn't stripped — fine for the current full-timestamp format.)

Nice work — the truth-table is thorough and the anti-flap semantics are right.

…ency test

Address PR review (aa-1):
- ResolveOpenPRDecision now keys MERGEABLE on Mergeable alone (checked
  before Approved). Mergeability is the authoritative merge gate, so a PR
  on a no-required-review repo no longer falls through to PR_OPEN. The
  approved+mergeable and approved-only cases are unchanged.
- Broaden the derive-consistency test to cover the probe and terminal
  deciders too, not just the open-PR ladder.
- Document the HashEvidence epoch-stripping regex's breadth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@harshitsinghbhandari

Copy link
Copy Markdown
Collaborator Author

Thanks for the thorough review, @harshitsinghbhandari. Addressed in 9920c6d:

#2 — mergeable unreachable without Approved ✅ Fixed. ResolveOpenPRDecision now keys MERGEABLE on Mergeable alone, checked before Approved. Since Mergeability is the authoritative merge gate (it already folds in approval where review is required), a mergeable PR on a no-required-review repo is now surfaced as ready-to-merge instead of falling through to PR_OPEN. The two explicitly-specified cases are preserved: approved && mergeable → MERGEABLE (caught by the mergeable branch first) and approved && !mergeable → APPROVED. Added a mergeable && !approved → MERGEABLE test case.

#4 — extend the consistency cross-check ✅ Done. Renamed to TestDecidersDeriveConsistently; it now asserts decision.Status == DeriveLegacyStatus(canonical) across the open-PR ladder and the probe + terminal deciders. (The terminal none/open default is excluded — it's a documented no-op for misuse, and its PR_OPEN/working shape is intentionally not derive-consistent.)

#5HashEvidence epoch regex breadth ✅ Documented. Added a comment noting \b\d{10,13}\b will also strip a same-width numeric ID, and that evidence strings should keep IDs out to preserve hash fidelity. Left the pattern as-is since evidence is decider-authored.

#1 — probe decider conflates liveness with activity state — Agreed, deferring to the LCM PR. This is a composition-rule question (which decider owns SessionState when the runtime is alive), not a bug in the pure function. Will document an explicit precedence rule so a liveness working verdict can't clobber a more specific activity-derived state (e.g. waiting_input) in ApplyRuntimeObservation.

#3 — detecting-memory lifecycle — Agreed, LCM-PR contract. The deciders already return Detecting == nil for every non-detecting outcome; the LCM must clear persisted detecting memory (LifecyclePatch.ClearDetecting) whenever a decision's Detecting is nil so a stale StartedAt/EvidenceHash can't leak into a later episode. Will wire this when the LCM lands.

Gates still green: gofmt clean, build/vet/test -race, 100% statement coverage.

@harshitsinghbhandari harshitsinghbhandari left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review — 9920c6d

Verified independently (throwaway worktree on 9920c6d): gofmt clean, build/vet pass, go test -race green at 100% coverage. All actionable findings resolved:

  • #2 mergeable-without-approval — correct. Moving Mergeable ahead of Approved but still behind CIFailing/ChangesRequested is a genuine strict superset: every previously-MERGEABLE case is unchanged, the no-required-review case is now surfaced, and CI-fail / changes-requested precedence is preserved. The "authoritative merge gate" comment captures the rationale.
  • #4 consistency cross-check — now spans the probe + terminal deciders (TestDecidersDeriveConsistently), with the terminal none/open no-op reasonably excluded. This locks decider output against DeriveLegacyStatus drift.
  • #5 epoch-regex breadth — documented.

#1 (liveness-vs-activity composition) and #3 (detecting-memory clearing) acknowledged as LCM-PR contract follow-ups — agreed, those belong there.

LGTM. Good, fast turnaround.

Comment thread backend/internal/domain/decide/decide.go Outdated
Comment thread backend/internal/domain/decide/decide.go
harshitsinghbhandari and others added 3 commits May 27, 2026 00:25
…ting helper

Address PR review (aa-1):
- Move the decider input/output type definitions (LifecycleDecision,
  ProbeInput, ProcessLiveness, OpenPRInput, DetectingInput) out of the
  execution file into a dedicated types.go, leaving decide.go for behaviour.
- Expand the detecting() helper doc to spell out that it adapts a probe
  verdict into the shared CreateDetectingDecision anti-flap path so each
  probe branch doesn't re-implement the quarantine counter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clarify the timestamp-stripping block flagged in review: spell out what
each of the three regexes matches (full ISO/RFC3339 datetime, bare
time-of-day, bare unix epoch) and why order matters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements the previously-stubbed, pure DECIDE-core logic for the backend lifecycle lane (LCM/SM), and adds comprehensive table-driven tests to lock in the decision truth tables.

Changes:

  • Implemented deterministic deciders for probe reconciliation, PR ladder resolution, terminal PR handling, detecting→stuck escalation, and evidence hashing.
  • Added evidence hashing normalization (timestamp stripping + whitespace collapse) to support stable detecting escalation.
  • Added exhaustive table tests plus consistency checks against domain.DeriveLegacyStatus.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
backend/internal/domain/decide/types.go Defines the DECIDE-core input/output shapes used by all deciders.
backend/internal/domain/decide/decide.go Implements the decider logic, detecting escalation, and evidence hashing.
backend/internal/domain/decide/decide_test.go Adds table-driven tests covering all decision branches and consistency invariants.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/internal/domain/decide/decide.go Outdated
Comment thread backend/internal/domain/decide/types.go
… contract

Address Copilot review:
- ResolveOpenPRDecision now sets a stable, timestamp-free Evidence summary
  "<condition> #<num> <url>" for every ladder outcome, consuming the
  previously-unused OpenPRInput.Number/URL identity inputs and making PR
  decisions traceable in logs. Covered by TestResolveOpenPRDecisionEvidence.
- Document LifecycleDecision's zero-value contract: an empty PRState/PRReason
  means "this decider does not address PR — leave unchanged", not PRNone. The
  LCM must map empty PR fields to a nil LifecyclePatch.PR; writing PRNone on a
  probe tick would clobber a live PR. (Pointers were considered but the empty
  sentinel is distinguishable from every valid state and consistent with the
  codebase's value-enum style; LifecyclePatch already owns nil-means-unchanged.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@harshitsinghbhandari harshitsinghbhandari merged commit 217c50e into feat/lcm-sm-contracts May 26, 2026
2 checks passed
neversettle17-101 added a commit that referenced this pull request Jun 3, 2026
…change

Defer findings #4 (isGitRepo case-sensitivity) and #5 (GitChecker seam) out
of this PR. Restores the original exec-based isGitRepo and the New(store)
constructor; removes git.go, git_test.go, and the test-only export shims. The
error-standardization and other findings are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
neversettle17-101 added a commit that referenced this pull request Jun 3, 2026
…ayers (#95) (#96)

* refactor(backend): LLD maintainability fixes in controllers/service layers

Addresses the high + medium severity findings from the LLD review of
backend/internal/httpd and backend/internal/service (#95):

1. Controllers no longer import internal/session_manager. Session sentinel
   errors are now *domain.ServiceError values carrying their own HTTP mapping,
   so the controller translates them with one generic errors.As — no
   cross-package sentinel imports.
2. One error pattern across services: project.Error is now an alias of the
   shared domain.ServiceError, and session_manager sentinels use it too. A
   single writeServiceError replaces the per-resource error switches.
3. Clean-orchestrator business logic moved out of the controller into
   session.Service.SpawnOrchestrator(ctx, projectID, clean).
4. isGitRepo no longer treats case-different paths as equal on case-sensitive
   filesystems; case-insensitive compare is gated to darwin/windows via samePath.
5. Project repo check sits behind an injectable GitChecker, so the service is
   testable without a real git binary.
6. httpd exports only the production constructors (NewWithDeps,
   NewRouterWithControl); the 3 test-only wrappers are removed and the
   "router with empty deps" convenience moved to an unexported test helper.

Closes #95

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(backend): standardize service errors on internal/httpd/errors

Replace the domain.ServiceError approach with a REST-API-scoped error package
and a single envelope renderer, per review feedback:

- Add internal/httpd/errors (package errors, aliased apierr): one structured
  Error type with semantic Kinds (Internal/Invalid/NotFound/Conflict) and
  constructors. Imports nothing, so any layer can depend on it.
- envelope.WriteError is now the single path from a service error to the wire
  APIError, and the only place a Kind becomes an HTTP status/word. The
  per-resource writeProjectError/writeSessionError translators are gone.
- Delete domain/errors.go (keeps domain pure of HTTP-flavored kinds) and
  service/project/errors.go (no per-service error files); services build
  errors inline via apierr constructors.
- session_manager sentinels are apierr.Error values (pointer identity still
  works with errors.Is).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* revert(backend): drop GitChecker seam and isGitRepo case-sensitivity change

Defer findings #4 (isGitRepo case-sensitivity) and #5 (GitChecker seam) out
of this PR. Restores the original exec-based isGitRepo and the New(store)
constructor; removes git.go, git_test.go, and the test-only export shims. The
error-standardization and other findings are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(session): translate engine errors to API errors at the facade

The session_manager is the internal command engine and must not depend on the
REST API error vocabulary. Revert its sentinels to plain errors.New values and
move the engine→API translation into the service/session facade (toAPIError),
which is the correct boundary. Controllers still see apierr.Error and never
import the engine; the engine no longer imports internal/httpd/errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(session): tighten error comments to state what the code does

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* style(envelope): make KindInternal an explicit case in httpStatus

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(apierr): rename package, test SpawnOrchestrator, parity fixes

Address review feedback on PR #96:
- Rename internal/httpd/errors → internal/httpd/apierr (package apierr) so
  importers no longer alias around the stdlib errors package.
- Add a commander seam to session.Service and unit-test the relocated
  clean-orchestrator rule: clean=true kills all active orchestrators before
  spawning; clean=false spawns without kills.
- project.Add: wrap the UpsertProject store error in apierr.Internal for parity
  with its sibling paths (was a raw 500).
- Document that KindInternal is iota's zero value, so a zero-value Error
  defaults to 500.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
neversettle17-101 added a commit that referenced this pull request Jun 12, 2026
…viewer to worker harness

Per PR #197 review feedback:
- Reviewer agent posts its review to the PR itself, so remove the
  ports.PRReviewPoster port, the GitHub review poster, the submit HTTP
  route + DTO, and the service Submit method (#1, #4, #7).
- Trigger spawns the reviewer agent over the worker's worktree with its
  own review prompt, mirroring the session launch flow (resolve agent by
  harness -> argv -> runtime.Create) (#8, #9).
- Default reviewer harness reuses the worker's harness when supported,
  falling back to claude-code; reviewer config stays independent of the
  worker override (#5, #6).
- Drop the `ao review` CLI for this PR's scope (#2, #3).

Regenerated OpenAPI + TS types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Vaibhaav-Tiwari pushed a commit that referenced this pull request Jun 14, 2026
* feat(review): configurable AO code review backend (V1)

Add per-project configurable code review of a worker's PR. A reviewer
agent runs one-shot over the worker's own worktree and posts its result
to the PR; the worker picks the feedback up through the existing SCM
observer review-nudge path.

- domain: ProjectConfig.reviewers (+ default reviewer harness), Review /
  ReviewRun types and verdict/status vocab.
- storage: review + review_run tables (0011), sqlc queries, store methods.
- service/review: rewrite the in-memory stub as a persisted ReviewService
  (Trigger/Submit/List) with a reviewer Runner over agent resolver +
  runtime; ports.PRReviewPoster implemented on the GitHub adapter.
- http: session-scoped routes POST /sessions/{id}/reviews/trigger,
  POST .../submit, GET .../reviews; regenerated OpenAPI + TS types.
- cli: ao review trigger|submit|list.
- frontend: adapt ReviewDashboard to the per-worker reviews API.

Closes #192

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): address review — drop submit/poster/CLI, default reviewer to worker harness

Per PR #197 review feedback:
- Reviewer agent posts its review to the PR itself, so remove the
  ports.PRReviewPoster port, the GitHub review poster, the submit HTTP
  route + DTO, and the service Submit method (#1, #4, #7).
- Trigger spawns the reviewer agent over the worker's worktree with its
  own review prompt, mirroring the session launch flow (resolve agent by
  harness -> argv -> runtime.Create) (#8, #9).
- Default reviewer harness reuses the worker's harness when supported,
  falling back to claude-code; reviewer config stays independent of the
  worker override (#5, #6).
- Drop the `ao review` CLI for this PR's scope (#2, #3).

Regenerated OpenAPI + TS types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(review): restore ao review submit (records verdict+body in AO)

Per maintainer request, bring back `ao review submit`. AO records the
reviewer's verdict and body on the review_run and marks the pass complete;
it does not post to GitHub — the reviewer agent posts its review to the PR
itself.

- storage: add review_run.body (0011), persist via Insert/UpdateReviewRunResult.
- service: restore Submit (no SCM poster) storing verdict + body.
- http: restore POST /sessions/{id}/reviews/submit + SubmitReviewInput.
- cli: ao review submit [worker] --verdict --body (worker from arg/--session/$AO_REVIEW_WORKER).
- runner: reviewer prompt instructs posting to GitHub and recording via ao review submit.

Regenerated OpenAPI + TS types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): move reviewer runner to its own package; sharpen prompt

Per PR #197 review:
- Move the concrete reviewer runner out of the service layer into a new
  internal/review_runner package (package reviewrunner), beside other
  orchestration packages like session_manager. The service keeps only the
  Runner interface + RunSpec it depends on; the agent-resolver + runtime
  launch flow lives in review_runner.
- Sharpen the reviewer prompt: tell the agent to diff against the PR base,
  focus on high-confidence findings, post via `gh pr review`, and record
  the result with `ao review submit`; review-only (no commits/edits).
- Add unit tests for the runner.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): simplify review_run schema; provider-agnostic reviewer prompt

Per PR #197 review:
- review_run: status default 'running' (drop 'pending'), drop CHECK
  constraints on status/verdict, drop the updated_at column and the
  session/iteration index. Propagated through queries, domain, store,
  service, and tests.
- Reviewer prompt no longer hardcodes GitHub/gh commands — it instructs the
  agent to use whatever review tooling the provider offers, keeping the
  flow extensible across SCM providers.

Regenerated sqlc + OpenAPI/TS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): launch reviewer before persisting the run

Trigger now spawns the reviewer agent first and then writes the review_run
with a status derived from the launch outcome (running on success, failed
if it never started), instead of inserting a running row and correcting it
to failed afterwards.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): pluggable reviewer registry distinct from worker harnesses

Reviewers are now their own pluggable adapter set, separate from the worker
agent registry — adding a reviewer (claude-code today, greptile tomorrow) is
a one-line registration that does not widen the worker harness vocabulary,
and a worker harness does not automatically become a valid reviewer.

- domain.ReviewerHarness: a distinct vocabulary (AllReviewerHarnesses) with
  its own IsKnown; ReviewerConfig/Review/ReviewRun use it. ResolveReviewerHarness
  reuses the worker harness only when it is itself a supported reviewer, else
  falls back to claude-code.
- ports.Reviewer: a reviewer-specific contract (ReviewCommand → argv + env)
  that models one-shot / non-prompt CLIs natively instead of forcing every
  reviewer through the worker's interactive GetLaunchCommand(Prompt:...).
- internal/adapters/reviewer: a separate registry + resolver (mirrors the
  worker agent registry) with the claude-code reviewer adapter, which owns the
  review prompt and reuses the worker claude-code launch construction.
- review_runner resolves via the reviewer registry (not the worker
  AgentResolver) and merges AO_REVIEW_WORKER into the adapter's env.
- daemon wires the reviewer resolver. Registry/domain parity is test-enforced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(review): cover run-scoped reviewer submit

* fix(api): update generated review submit schema

* refactor(review): split core engine (internal/review) from API service

Move the review orchestration (Trigger/Submit/List, run-id generation,
deps, RunSpec/Runner, sentinels) into a transport-independent core package
internal/review (Engine). internal/service/review is now a thin API-flow
boundary: the controller-facing Manager interface + a Service that delegates
to the engine + error re-exports.

This keeps the service layer to API concerns and lets the same engine back a
future in-process CLI trigger without going through HTTP. review_runner now
depends on the core package; daemon builds the engine and wraps it in the
service. No API/schema changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(review): commit-aware trigger, reviewer handle for UI, no env vars

Reworks the review trigger lifecycle and drops env-based coupling:

- review_run gains target_sha (the reviewed commit) and drops iteration.
  A repeat trigger for the same PR head short-circuits to the existing run.
- review gains reviewer_handle_id: the live reviewer pane's runtime handle,
  reused across passes and exposed in the reviews API so the UI can attach
  its terminal over /mux.
- Trigger flow: if a live reviewer pane exists and a new commit arrived,
  message it to re-review; otherwise spawn a fresh reviewer. The run is
  recorded only after the reviewer is launched.
- No environment variables: the reviewer adapter embeds the explicit
  `ao review submit --session <w> --run <id>` command in the spawn prompt
  and the re-review message. CLI submit requires --run/--session (no env
  fallbacks).
- Merge review_runner into internal/review as a Launcher (spawn/notify/alive).
- Trigger returns 201 for a new pass, 200 when reusing an existing run.

Regenerated sqlc + OpenAPI/TS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): author the reviewer prompt centrally, not in the adapter

Mirror the worker model (session_manager builds the prompt; adapters just
place it via LaunchConfig.Prompt). The reviewer prompt now lives in
internal/review/prompt.go and is passed through ports.ReviewInvocation.Prompt;
the claude-code reviewer adapter just feeds inv.Prompt to its launch command
and returns it as the re-review message. One-shot CLI reviewers may ignore it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): split reviewer prompt into system+task, mirroring buildSpawnTexts

Mirror session_manager.buildSpawnTexts for the reviewer: a standing role goes
in the system prompt, the per-pass task (PR/commit + exact `ao review submit`
command) goes in the user prompt. internal/review/prompt.go now returns
(prompt, systemPrompt); both flow through ports.ReviewInvocation and the
claude-code adapter places them via LaunchConfig{Prompt, SystemPrompt}. The
re-review message reuses the per-pass prompt (role already established in the
running pane).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Vaibhaav <user@example.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants